The following analysis is parallel to the analysis performed in the manuscript, however it is preceded by the non-linear transformation steps outlined below.

Non-linear transformation evaluation

Assessing degrees for splines

We began by evaluating the use of cubic splines with a penalty score encouraging monotonicity using a general additive model in the mgcv package. We iteratively increased the number of degrees from 1-5, and evaluated the fits using AIC to select the strongest model for each landscape, then logged the number of degrees that should be used for each transform.

Unique_ID Best_Degree Best_AIC
AP_catef_1 4 79.375299
DHFR_ic50_c57 1 22.838106
DHFR_ic50_c58 1 29.992922
DHFR_ic50_c59 4 18.441344
DHFR_ic50_c60 5 24.490984
DHFR_ic50_c61 5 9.666351
DHFR_ic75_palmer 5 197.681499
DHFR_kcat_trajg 4 6.681135
DHFR_kcat_trajr 5 -3.673020
DHFR_ki_trajg 5 18.610525
DHFR_ki_trajr 4 6.720748
MPH_catact_CaPTM 1 35.963443
MPH_catact_CdPTM 1 -11.111711
MPH_catact_CoPTM 1 24.892223
MPH_catact_CuPTM 1 23.676399
MPH_catact_MgPTM 1 26.680008
MPH_catact_MnPTM 1 11.392128
MPH_catact_NiPTM 1 4.306652
MPH_catact_ZnPTM 4 46.094248
NfsA_ec50_2039 5 -101.318022
NfsA_ec50_3637 5 -65.972370
OXA-48_ic50_CAZtraj1 5 -32.759419
OXA-48_ic50_CAZtraj2 4 -26.712493
OXA-48_ic50_CAZtraj3 1 -99.376569
OXA-48_ic50_PIPtraj1 5 6.216836
OXA-48_ic50_PIPtraj2 5 28.085051
OXA-48_ic50_PIPtraj3 4 -20.227528
PTE_catact_2NH 5 54.866422
PTE_catact_butyrate 5 -15.140782
TEM_MIC_weinreich 5 39.737284
TEM_growth_AM 4 13.245353
TEM_growth_AMC 4 7.064258
TEM_growth_AMP 5 42.425104
TEM_growth_CAZ 1 40.127210
TEM_growth_CEC 1 39.052728
TEM_growth_CPD 5 32.778424
TEM_growth_CPR 1 30.292993
TEM_growth_CRO 5 40.952662
TEM_growth_CTT 1 45.810669
TEM_growth_CTX 5 36.284359
TEM_growth_CXM 4 23.141110
TEM_growth_FEP 1 26.596062
TEM_growth_SAM 5 22.150169
TEM_growth_TZP 5 21.309300
TEM_growth_ZOX 5 23.862908

Visual spline model evaluation

We then visually evaluated the cubic spline transformations (in blue) for each landscape with the x-axix representing additive effects, and y axis representing observed effects.

From the fits we notice a certain degree of overfitting. One approach to alleviate this would be to arbitrarily lower the number of degrees, however instead we opted to use a different transformation method which is less prone to overfitting.


Evaluation of four parameter transformations

The drawback of using monotonic splies (or even power transforms), as can be seen from the plots above, is the lack of bounding. Bounding is crucial to avoid greatly transforming phenotype values that lay outside of the bounds of the transform, which itself is constrained by the range of the predicted first-order effects.

In other words, phenotypes with magnitudes that lay outside of the range of predicted values by the first-order model are at risk of being incorrectly transformed. Though this is somewhat true for a four-parameter model, the bounded upper- and lower- thresholds ensure that phenotypes that lay outside of the predicted range will be approximated more accurately, i.e., the transformation will be lower in magnitude than other models.

Four parameter function transform

We attempted to transform all datasets using the four-parameter function, with fitting performed by non-linear least squares regression using nlsLM. The fits are compared to a simple linear model fit using an AIC to determine whether the additional information stemming from the four-parameter transform is parsimonious, otherwise, no transform is applied.

## Error in nlsModel(formula, mf, start, wts) : 
##   singular gradient matrix at initial parameter estimates

## Error in nlsModel(formula, mf, start, wts) : 
##   singular gradient matrix at initial parameter estimates

Indeed, these fits appear much better visually than the spline fits, as they avoid overfitting to datasets. We used these transforms in the subsequent analyses. Note: we removed landscapes TEM_growth_AMP, TEM_growth_AMC, TEM_growth_AMP, and TEM_growth_CAZ, as the four-parameter model was more parsimonous than the linear model, however the fits reduced all values in the landscape to binary values that represented the upper or lower bounds of the four-parameter model.


Heterogeneity in single mutational effects (SMEs)

The spread of functional contributions of the positions. This shows whether introducing a mutation in a given position impacts the function positively or negatively (or neutral) across all backgrounds

## Warning: Removed 7 rows containing non-finite values (`stat_bin()`).

Spread in SMEs - 2SD

We have 198 positions.

Histogram of 2SD - SMEs
## Warning: Removed 4 rows containing non-finite values (`stat_bin()`).
## Warning: Removed 2 rows containing missing values (`geom_bar()`).

Table of 2SD - SMEs

First order positions which have a heterogeneity index > log10 1.5-fold, 2-fold, 5-fold, and 10-fold for adaptive trajectories

mutations idiosync_sig_1.5 idiosync_sig_2 idiosync_sig_5 idiosync_sig_10 idiosync_total idiosync_1.5 idiosync_2 idiosync_5 idiosync_10
Order 1 189 184 141 110 198 95.45455 92.92929 71.21212 55.55556

Spread in SMEs - sign

We assign positive, neutral, negative, negative-neutral, positive-neutral, and negative-positive ‘types’ based on log10 1.5-fold threshold to every order

Piechart of sign - SMEs

Note, the aesthetics of the pie charts were edited elsewhere for publication.

Table of sign - SMEs and EEs

Raw Values

## `mutate_all()` ignored the following grouping variables:
## • Column `mutations`
## ℹ Use `mutate_at(df, vars(-group_cols()), myoperation)` to silence the message.
mutations Negative Neutral Neutral Negative Neutral Positive Positive Positive Negative Total
Order 1 3 5 34 44 20 92 198
Order 2 3 11 44 68 10 259 395
Order 3 12 13 51 58 37 251 422
Order 4 14 3 47 38 8 135 245
Order 5 8 3 12 8 9 44 84
Order 6 0 0 3 5 1 5 14

Percentages

mutations Negative Neutral Neutral Negative Neutral Positive Positive Positive Negative Total
Order 1 1.5151515 2.525253 17.17172 22.22222 10.101010 46.46465 198
Order 2 0.7594937 2.784810 11.13924 17.21519 2.531646 65.56962 395
Order 3 2.8436019 3.080569 12.08531 13.74408 8.767772 59.47867 422
Order 4 5.7142857 1.224490 19.18367 15.51020 3.265306 55.10204 245
Order 5 9.5238095 3.571429 14.28571 9.52381 10.714286 52.38095 84
Order 6 0.0000000 0.000000 21.42857 35.71429 7.142857 35.71429 14

Spread in SMEs - wt/mean dev

The purpose of this analysis was to determine the difference in SMEs between the WT-background contribution of a given position/combination and the mean contribution of a given position or combination

Histogram of wt/mean dev - SMEs

In the manuscript, we also briefly mentioned how many WT SMEs have sign deviation (not just magnitude) from the mean at the 1st order. This is in fact 1.01% (2/198).

Table of wt/mean dev - SMEs

Also, in the supplementary of our paper, we show this table to show percentages of WT-bg points that deviate from the positional/combinatorial mean by multiple significance threshold (log10 1.5-, 2-, 5-, and 10-fold)

Order percent_1.5 outlier_1.5 percent_2 outlier_2 percent_5 outlier_5 percent_10 outlier_10 total
1 66.7 132 52.5 104 24.2 48 11.6 23 198
2 59.5 235 47.8 189 20.5 81 9.6 38 395
3 51.7 218 34.4 145 19.0 80 12.6 53 422
4 60.4 148 40.0 98 16.3 40 11.0 27 245

As well as the devation of every single SME and EE from the mean, instead of just the wt background.

Table of all SME and EE to mean dev
Order percent_1.5 outlier_1.5 percent_2 outlier_2 percent_5 outlier_5 percent_10 outlier_10 total
1 0.5800305 2283 0.4153963 1635 0.1572663 619 0.0884146 348 3936
2 0.5148601 2356 0.3529283 1615 0.1440122 659 0.0804196 368 4576
3 0.5091712 1499 0.3383152 996 0.1348505 397 0.0889946 262 2944
4 0.6017857 674 0.4062500 455 0.1187500 133 0.0758929 85 1120

Heterogeneity in epistatic effects

Spread in EEs - 2SD

Histogram to show heterogenity at Orders 2, 3, and 4 with the log10 1.5-fold significance threshold

Histogram of 2SD - pairwise
## Warning: Removed 6 rows containing non-finite values (`stat_bin()`).
## Warning: Removed 2 rows containing missing values (`geom_bar()`).

Histogram of 2SD - three-way
## Warning: Removed 4 rows containing non-finite values (`stat_bin()`).
## Warning: Removed 2 rows containing missing values (`geom_bar()`).

Histogram of 2SD - four-way
## Warning: Removed 5 rows containing non-finite values (`stat_bin()`).
## Warning: Removed 2 rows containing missing values (`geom_bar()`).

Table of 2SD in all EEs - Extended data table 1

What percentage of positions remain significantly heterogeneous (1.5-, 2-, 5-, and 10-fold) at all orders?

mutations idiosync_sig_1.5 idiosync_sig_2 idiosync_sig_5 idiosync_sig_10 idiosync_total idiosync_1.5 idiosync_2 idiosync_5 idiosync_10
Order 1 189 184 141 110 198 95.45455 92.92929 71.21212 55.55556
Order 2 376 351 229 172 395 95.18987 88.86076 57.97468 43.54430
Order 3 397 359 206 147 422 94.07583 85.07109 48.81517 34.83412
Order 4 244 225 123 79 245 99.59184 91.83673 50.20408 32.24490

Spread in EEs - sign

Piechart of sign - EE pairiwise

Piechart of sign - EE three-way

Piechart of sign - EE four-way

Spread in EEs - wt/mean dev

Histogram of wt/mean dev - pairwise EEs
## Warning: Removed 3 rows containing non-finite values (`stat_bin()`).
## Warning: Removed 2 rows containing missing values (`geom_bar()`).

Histogram of wt/mean dev - three-way EEs
## Warning: Removed 6 rows containing non-finite values (`stat_bin()`).
## Warning: Removed 2 rows containing missing values (`geom_bar()`).

Histogram of wt/mean dev - four-way EEs
## Warning: Removed 5 rows containing non-finite values (`stat_bin()`).
## Warning: Removed 2 rows containing missing values (`geom_bar()`).

Like the SMEs, we checked whether there is a sign deviation between the wt EE and mean EE at each order of interaction. For pairwise we saw 37/395 (9.37%). For three-way we saw 26/422 (6.16%). For four-way we saw 17/245 (6.94%).

Prediction of end-points

Import the absolute error files for each data set

How well does each model predict at each order

Step wtbg_mean wtbg_median lin_mean lin_median
1 33.01911 9.246631 2.822502 2.175932
2 95.83797 48.281320 1.969366 1.706986
3 53.33004 6.599577 1.314896 1.183863
4 39.87798 6.595830 1.088091 1.048122
5 26.80878 4.350230 1.055684 1.024498
6 6.66224 6.662240 1.014927 1.014927

Are the means significantly less than 1.5-fold?

Step wtbg_pval lin_mean
1 0.9999993 0.9999859
2 1.0000000 0.9923349
3 0.9999835 0.0051152
4 0.9970087 0.0000000

This plot omits some data from being viewed, however plotted means and medians are accurate. The figure is edited outside of the code for aesthetics.

Prediction of intermediates

Instead of predicting end point variants, we can look at how the biochemical model predicts variants along the most accessible trajectory by monitoring AE for the predicted function of each intermediate. The x-axis represent the order of the intermediate mutant that is being predicted along the most accessible trajectory.

Interestingly, for the trajectories that were transformed by the non-linear transform (both DHFR_ki trajectories, NfsA trajectories, OXA trajectories, PTE, and TEM) the previous finding of the lower model often being better at prediction is alleviated. Hence, it appears it was mostly non-specific epistasis that was causing the large error in prediction of incorporating apparent idiosyncratic epistasis into the model.

Network rewiring

First we load the required data for network rewiring exploration, as well as establish filters for adaptive trajectories and ‘mechanistic’ trajectories.

Then we look at how many network rewiring events there are.

epistasis at lower-order? epistasis at higher-order? n percentage
FALSE FALSE 242 23.56378
FALSE TRUE 251 24.44012
TRUE FALSE 147 14.31353
TRUE TRUE 387 37.68257

Of the rewired, how many are constructive and how many are disruptive?

event_type n percentage
constructive 93 24.03101
disruptive 294 75.96899

Molecular basis

This analysis was bespoke, but surveying all network rewiring events manually and selecting genotypes along adaptive trajectories.